ABSTRACT

Motivation: Phylogenies are increasingly used in all fields of medical
and biological research. Moreover, because of the next-generation 
sequencing revolution, datasets used for conducting phylogenetic 
analyses grow at an unprecedented pace. RAxML (Randomized 
Axelerated Maximum Likelihood) is a popular program for phylogen- 
etic analyses of large datasets under maximum likelihood. Since the 
last RAxML paper in 2006, it has been continuously maintained and 
extended to accommodate the increasingly growing input datasets 
and to serve the needs of the user community. 
Results: I present some of the most notable new features and 
extensions of RAxML, such as a substantial extension of substitution 
models and supported data types, the introduction of SSE3, AVX and 
AVX2 vector intrinsics, techniques for reducing the memory require- 
ments of the code and a plethora of operations for conducting post- 
analyses on sets of trees. In addition, an up-to-date 50-page user 
manual covering all new RAxML options is available. 
Availability and implementation: The code is available under GNU 
GPL at https://github.com/stamatak/standard-RAxML. 
Contact: alexandros.stamatakis@h-its.org 

Supplementary information: Supplementary data are available at 
Bioinformatics online. 
1 INTRODUCTION

RAxML (Randomized Axelerated Maximum Likelihood) is a
popular program for phylogenetic analysis of large datasets 
under maximum likelihood. Its major strength is a fast maximum 
likelihood tree search algorithm that returns trees with good 
likelihood scores. Since the last RAxML paper (Stamatakis, 
2006), it has been continuously maintained and extended to ac- 
commodate the increasingly growing input datasets and to serve 
the needs of the user community. In the following, I will present 
some of the most notable new features and extensions of RAxML. 

2 NEW FEATURES 

2.1 Bootstrapping and support values 

RAxML offers four different ways to obtain bootstrap support. 
It implements the standard non-parametric bootstrap and also 
the so-called rapid bootstrap (Stamatakis et al., 2008), which is a
standard bootstrap search that relies on algorithmic shortcuts
and approximations to speed up the search process. 
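
The resampling step that underlies the standard non-parametric bootstrap can be illustrated with a minimal Python sketch. This is not RAxML code: the function name `bootstrap_replicate` and the toy alignment are hypothetical, and the tree search that would be run on each replicate is omitted.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps taxon name -> aligned sequence (all of equal length).
    Columns are drawn with replacement until the original length is reached.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    # Draw n_sites column indices uniformly at random, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[c] for c in columns)
        for taxon, seq in alignment.items()
    }


# Toy alignment (hypothetical data, for illustration only).
alignment = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTTCGT",
    "taxon3": "ACCTACGA",
}
rng = random.Random(12345)  # fixed seed so the replicate is reproducible
replicate = bootstrap_replicate(alignment, rng)
```

Each replicate keeps the taxa and the alignment length but changes how often each column occurs; support values are then derived from the trees inferred on many such replicates.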

It also offers an option to calculate the so-called SH-like 
support values (Guindon et al., 2010). I recently implemented 
a method that allows for computing RELL (Resampling 
Estimated Log Likelihoods) bootstrap support as described by 
Minh et al. (2013). 

Apart from this, RAxML also offers a so-called bootstopping 
option (Pattengale et al., 2010). When this option is used, 
RAxML will automatically determine how many bootstrap rep- 
licates are required to obtain stable support values. 

2.2 Models and data types 

Apart from DNA and protein data, RAxML now also supports 
binary, multi-state morphological and RNA secondary structure
data. It can correct for ascertainment bias (Lewis, 2001) for all of
the above data types. This might be useful not only for morphological
data matrices that only contain variable sites but also for
alignments of SNPs. 
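
The idea behind such a correction can be sketched as follows, assuming the conditional-likelihood formulation usually attributed to Lewis (2001): the likelihood of each observed site is divided by the probability that a site is variable at all. The Python function below is a hypothetical illustration with made-up inputs, not the RAxML implementation, which computes these quantities internally.

```python
import math


def corrected_log_likelihood(site_log_likelihoods, invariant_log_likelihoods):
    """Apply a conditional-likelihood ascertainment bias correction.

    site_log_likelihoods: log-likelihood of every observed (variable) site.
    invariant_log_likelihoods: log-likelihood of every constant pattern that
        could not have been observed (e.g. all-A, all-C, all-G, all-T).
    """
    # Probability that a site is constant and therefore excluded by design.
    p_invariant = sum(math.exp(ll) for ll in invariant_log_likelihoods)
    if not 0.0 <= p_invariant < 1.0:
        raise ValueError("invariant patterns must have total probability in [0, 1)")
    # Each site likelihood is divided by (1 - p_invariant); in log space this
    # subtracts log(1 - p_invariant) once per site.
    log_variable = math.log1p(-p_invariant)
    return sum(site_log_likelihoods) - len(site_log_likelihoods) * log_variable


# Hypothetical numbers, for illustration only.
sites = [-3.2, -4.1, -2.9]
invariant_patterns = [math.log(0.05)] * 4  # four constant patterns, 0.05 each
corrected = corrected_log_likelihood(sites, invariant_patterns)
```

Because the divisor is smaller than one, the corrected log-likelihood is higher than the uncorrected sum whenever constant patterns have non-zero probability.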

The number of available protein substitution models has been
significantly extended and comprises a general time reversible
(GTR) model, as well as the computationally more complex
LG4M and LG4X models (Le et al., 2012). RAxML can also
automatically determine the best-scoring protein substitution 
model. 

Finally, a new option for conducting a maximum likelihood
estimate of the base frequencies has become available. 

2.3 Parallel versions 

RAxML offers a fine-grain parallelization of the likelihood function
for multi-core systems via the PThreads-based version and a
coarse-grain parallelization of independent tree searches via MPI
(Message Passing Interface). It also supports coarse-grain/fine-grain
parallelism via the hybrid MPI/PThreads version (Pfeiffer
and Stamatakis, 2010).

Note that, for extremely large analyses on supercomputers, 
using the dedicated sister program ExaML [Exascale Maximum 
Likelihood (Stamatakis and Aberer, 2013)] is recommended. 

2.4 Post-analysis of trees 

RAxML offers a plethora of post-analysis functions for sets of
trees. Apart from standard statistical significance tests, it offers
efficient (and partially parallelized) operations for computing
Robinson-Foulds distances, as well as extended majority rule,
majority rule and strict consensus trees (Aberer et al., 2010).
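
To make the Robinson-Foulds (RF) distance concrete, the following Python sketch counts the non-trivial bipartitions that occur in exactly one of two trees. It is a simple illustration under stated assumptions, not the optimized RAxML implementation: trees are given as nested tuples of taxon names, both trees contain the same taxa, and the helper names `bipartitions` and `rf_distance` are hypothetical.

```python
def bipartitions(tree, taxa):
    """Return the non-trivial bipartitions of a tree given as nested tuples."""
    taxa = frozenset(taxa)
    reference = min(taxa)  # fixed taxon used to pick one canonical side
    splits = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        # Non-trivial split: at least two taxa on each side of the branch.
        if 1 < len(below) < len(taxa) - 1:
            # Store the side that does not contain the reference taxon, so
            # the same split is encoded identically regardless of rooting.
            splits.add(below if reference not in below else taxa - below)
        return below

    leaves_below(tree)
    return splits


def rf_distance(tree1, tree2, taxa):
    """Unnormalized RF distance: size of the symmetric difference."""
    return len(bipartitions(tree1, taxa) ^ bipartitions(tree2, taxa))


taxa = {"A", "B", "C", "D", "E"}
tree_a = (("A", "B"), ("C", "D"), "E")
tree_b = ("A", "B", ("E", ("C", "D")))  # same topology, written differently
tree_c = (("A", "C"), ("B", "D"), "E")  # conflicting topology

same = rf_distance(tree_a, tree_b, taxa)
different = rf_distance(tree_a, tree_c, taxa)
```

In this toy example `tree_a` and `tree_b` share all bipartitions, whereas `tree_a` and `tree_c` share none.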

Beyond this, it implements a method for identifying so-called
rogue taxa (Pattengale et al., 2011), and I recently implemented
options for calculating the TC (Tree Certainty) and IC
(Internode Certainty) measures as introduced by Salichos and 
Rokas (2013). 
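
As a rough illustration of these measures, the sketch below assumes the entropy-based definition in which the IC of an internode is computed from the frequencies of the two most prevalent conflicting bipartitions, and the TC is the sum of the IC values over all internodes. The function names and the example frequencies are hypothetical, and details of the actual RAxML implementation may differ.

```python
import math


def internode_certainty(freq_main, freq_conflict):
    """IC from the support of a bipartition and of its strongest conflict.

    Assumed definition: IC = log2(2) + sum_i p_i * log2(p_i), with p_i the
    relative frequencies of the two bipartitions.
    """
    total = freq_main + freq_conflict
    if total <= 0:
        raise ValueError("at least one bipartition must have been observed")
    ic = 1.0  # log2(2)
    for freq in (freq_main, freq_conflict):
        p = freq / total
        if p > 0:  # the term p * log2(p) tends to 0 as p tends to 0
            ic += p * math.log2(p)
    return ic


def tree_certainty(frequency_pairs):
    """TC as the sum of IC values over all internodes."""
    return sum(internode_certainty(main, conflict) for main, conflict in frequency_pairs)


# Hypothetical bipartition frequencies for three internodes.
pairs = [(100, 0), (50, 50), (80, 20)]
tc = tree_certainty(pairs)
```

Under this definition an internode without conflict receives an IC of 1, and an internode whose two competing bipartitions are equally frequent receives an IC of 0.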

Finally, there is the new plausibility checker option (Dao et al.,
2013) that allows computing the RF distances between a huge
phylogeny with tens of thousands of taxa and several smaller, more
accurate reference phylogenies that contain a strict subset of the
taxa in the huge tree. This option can be used to automatically
assess the quality of huge trees that cannot be inspected by eye.

2.5 Analyzing next-generation sequencing data 

RAxML offers two algorithms for preparing and analyzing next- 
generation sequencing data. A sliding-window approach (unpublished)
is available to assess which regions of a gene (e.g. 16S)
exhibit strong and stable phylogenetic signal to support decisions 
about which regions to amplify. Apart from that, RAxML also 
implements parsimony and maximum likelihood flavors of the 
evolutionary placement algorithm [EPA (Berger et al., 2011)] 
that places short reads into a given reference phylogeny obtained 
from full-length sequences to determine the evolutionary origin 
of the reads. It also offers placement support statistics for those 
reads by calculating likelihood weights. This option can also be 
used to place fossils into a given phylogeny (Berger and 
Stamatakis, 2010) or to insert different outgroups into the tree 
a posteriori, that is, after the inference of the ingroup phylogeny. 
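
The notion of a likelihood weight can be sketched as follows, assuming the weight of a candidate branch is the likelihood of placing the read on that branch divided by the sum of the likelihoods over all candidate branches. The function name and the log-likelihood values are hypothetical; this is not the EPA code.

```python
import math


def likelihood_weights(log_likelihoods):
    """Convert per-branch placement log-likelihoods into weights summing to 1."""
    if not log_likelihoods:
        raise ValueError("at least one candidate placement is required")
    # Subtract the maximum before exponentiating to avoid numerical underflow;
    # the common factor cancels in the ratio.
    best = max(log_likelihoods)
    scaled = [math.exp(ll - best) for ll in log_likelihoods]
    total = sum(scaled)
    return [value / total for value in scaled]


# Hypothetical log-likelihoods of one read placed on four candidate branches.
placements = [-12045.3, -12043.1, -12050.8, -12043.9]
weights = likelihood_weights(placements)
best_branch = max(range(len(weights)), key=weights.__getitem__)
```

A weight close to 1 on a single branch indicates a confident placement, whereas weights spread over several branches indicate an uncertain one.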

2.6 Vector intrinsics 

RAxML uses manually inserted and optimized x86 vector intrin- 
sics to accelerate the parsimony and likelihood calculations. 
It supports SSE3, AVX and AVX2 (using fused multiply-add 
instructions) intrinsics. For a small single-gene DNA alignment 
using the Γ model of rate heterogeneity, the unvectorized version
of RAxML requires 111.5 s, the SSE3 version 84.4 s and the
AVX version 66.22 s to complete a simple tree search on an
Intel i7-2620M core running at 2.70 GHz under Ubuntu Linux.

The differences between AVX and AVX2 are less pronounced 
and are typically below 5% run time improvement. 
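
The run times reported above translate into speedup factors as shown in this short Python calculation. Only the three measured times are taken from the text; the variable names are illustrative.

```python
# Run times in seconds for the same tree search, as reported in the text.
run_times = {"unvectorized": 111.5, "SSE3": 84.4, "AVX": 66.22}

baseline = run_times["unvectorized"]
# Speedup relative to the unvectorized version: baseline time / version time.
speedups = {name: baseline / seconds for name, seconds in run_times.items()}
# Additional gain of AVX over SSE3.
avx_over_sse3 = run_times["SSE3"] / run_times["AVX"]

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
print(f"AVX over SSE3: {avx_over_sse3:.2f}x")
```

On this dataset the SSE3 version is thus roughly 1.3 times and the AVX version roughly 1.7 times faster than the unvectorized code.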

2.7 Saving memory 

Because memory shortage is becoming an issue due to the grow- 
ing dataset sizes, RAxML implements an option for reducing 
memory footprints and potentially run times on large phyloge- 
nomic datasets with missing data. The memory savings are pro- 
portional to the amount of missing data in the alignment 
(Izquierdo-Carrasco et al., 2011).

2.8 Miscellaneous new options 

RAxML offers options to conduct fast and more superficial tree 
searches on datasets with tens of thousands of taxa. It can also 
compute marginal ancestral states and offers an algorithm for 
rooting trees. Furthermore, it implements a sequential, 
PThreads-parallelized and MPI-parallelized algorithm for computing
all quartets or a subset of quartets for a given alignment.

3 USER SUPPORT AND FUTURE WORK 

User support is provided via the RAxML Google group 
at: https://groups.google.com/forum/?hl=en#!forum/raxml. The 
RAxML source code contains a comprehensive manual and 
there is a step-by-step tutorial with some basic commands available
at http://www.exelixis-lab.org/web/software/raxml/hands_on.html.
Further resources are available via the RAxML software page at
http://www.exelixis-lab.org/web/software/raxml/

Future work includes the continued maintenance of RAxML, 
the adaptation to novel computer architectures and the implemen- 
tation of novel models and datatypes, in particular codon models. 